Disentangling the origins of viticulture in the western Mediterranean

We present direct evidence of early grape domestication in southern Italy via a multidisciplinary study of pip assemblage from one site, shedding new light on the spread of viticulture in the western Mediterranean during the Bronze Age. This consist of 55 waterlogged pips from Grotta di Pertosa, a Middle Bronze Age settlement in the south of the Italian peninsula. Direct radiocarbon dating of pips was carried out, confirming the chronological consistency of the samples with their archaeological contexts (ca. 1450–1200 BCE). The extraordinary state of conservation of the sample allowed to perform geometric morphometric (GMM) and paleogenetic analyses (aDNA) at the same time. The combination of the two methods has irrefutably shown the presence of domestic grapevines, together with wild ones, in Southern Italy during the Middle/Late Bronze Age. The results converge towards an oriental origin of the domestic grapes, most likely arriving from the Aegean area through the Mycenaeans. A parent/offspring kinship was also recognised between a domestic/wild hybrid individual and a domestic clonal group. This data point out a little known aspect of the diffusion of the first viticulture in Italy, and therefore in the western Mediterranean, which involved the hybridization between imported domestic varieties with, likely local, wild vines.


Figures S1 to S3
Tables S1 to S4 SI References

Supporting Information Text
Supplemental Methods DNA extraction.Ancient DNA from ten waterlogged grape pips from Grotta di Pertosa was recovered and analyzed following grapevine palaeogenomic methods developed by Ramos-Madrigal, et al. [1] (Fig. S2).Archaeological seeds were processed in a dedicated palaeogenomics facility at the University of York.The facility adheres to recommendations for ancient DNA research [2], including physical separation from post-PCR laboratories, positive air pressure, UV-C irradiation of benches and laminar flow cabinets, use of full-body suits by laboratory personnel, and frequent sterilization of surfaces.
Archaeological seeds were processed individually to recover a single genetic profile.Samples were bleached in 5% NaClO solution for 30 seconds to reduce modern DNA contamination and then rinsed in molecular biology grade water.Seeds were manually crushed with sterile micropestles and further homogenized in PowerBead tubes (Qiagen catalog number 13117-50) by oscillating in a Retsch Mixer Mill MM200 for 10 minutes at 25 Hz.
DNA was isolated from the specimens and two extraction controls using a protocol designed for archaeobotanical remains [3].The procedure consists of an overnight incubation in a lysis buffer, a phenol-chloroform DNA extraction, and a subsequent purification in a MinElute column (Qiagen catalog number 28004) with an optimized GuHCl binding buffer to retain ultrashort DNA molecules [4].The DNA eluates were slightly pigmented, potentially caused by the co-extraction of polyphenols from the pips or humic acids from the archaeological sediment.As these compounds may inhibit enzymatic reactions during library preparation or PCR amplification [5,6], the pigmented DNA extracts were further purified with OneStep PCR Inhibitor Removal columns (Zymo Research catalog number D6030).The manufacturer's recommendations were followed, except that the 30 μl DNA eluate added to the OneStep PCR Inhibitor Removal columns was below the suggested 50-200 μl input volume.DNA concentrations were measured with a Qubit dsDNA HS assay kit (Invitrogen catalog number Q32854).
Illumina library preparation.DNA was converted to Illumina libraries using the Santa Cruz Reaction (SCR) single-stranded DNA protocol for ancient DNA [7].SCR libraries were prepared in two rounds: screening libraries to estimate endogenous content and observe damage profiles and higher-complexity, damage-repaired libraries for enrichment of SNP loci.
The screening libraries were prepared on 5 μl of purified DNA (~17% of total extract volume).One SCR library control was also prepared on 5 μl of molecular grade water to monitor contamination.The amount of SCR adapter-splints and extreme thermostable single-stranded DNA binding proteins (ET SSB) were varied according to the amount of input DNA, following the suggested tiers in the SCR protocol.A real-time quantitative PCR assay was used to estimate the number of cycles needed for indexing PCR [8].The screening libraries were PCR-amplified in 25-μl reactions, containing 8-12 μl of library, 1AmpliTaq Gold buffer, 2.5 mM MgCl2, 200 μM dNTPs, 1 U AmpliTaq Gold 360 polymerase, 1 μM forward primer with 8-mer sample-specific index, 1 μM reverse primer with 8-mer sample-specific index, and 0.4 mg/mL bovine serum albumin.The PCR cycling conditions were 10 minutes at 95°C to denature DNA and activate the polymerase, 8-18 cycles of 95°C for 30 seconds, 60°C annealing for 30 seconds, and 72°C extension for 60 seconds, and a final extension of 72°C extension for 7 minutes.The amplified libraries were purified with homemade SPRI beads [9] and quantified with a TapeStation High Sensitivity D1000 assay (Agilent catalog number 5067-5584).The indexed libraries, including both extraction controls and the library control, were pooled and sequenced on a partial lane of an Illumina HiSeq 4000 in paired-end mode for 150 cycles at the Novogene facility in Sacramento, California.
The screening data demonstrated nine archaeological specimens contained more than 0.1% endogenous DNA and were progressed to the second phase of library preparation.The remaining raw DNA (25 μl) was treated with USER enzyme (New England BioLabs) to remove uracil residues, following the approach of a full uracil-DNA-glycosylase (UDG) treatment [10], with adaptations for the SCR protocol.The USER enzyme treatment was performed in 50-μl reactions containing 1× Cut Smart buffer and 3U USER enzyme, using an incubation of 37°C for three hours.The USERtreated DNA was purified in MinElute columns with the optimized GuHCl binding buffer to retain ultrashort DNA molecules [4].The DNA was eluted in 20 μl to match the maximum input volume of the SCR library protocol.The entire volume of USER-treated DNA was converted to SCR libraries as described previously.The indexing PCR steps were performed as above, except that the PCR reaction size was 100 μl, the amount of input library was 48 μl (the entire volume), and 8-10 PCR cycles were used.
DNA Enrichment of SNP loci.The high-complexity, damage-repaired SCR libraries were enriched using a custom myBaits target capture kit produced by Daicel Arbor Biosciences.The enrichment kit contains RNA probes to target 10,000 SNPs [1], allowing for the identification of cultivated varieties and analysis of population structure in wild and domesticated grapevine accessions [11].
The nine SCR libraries were enriched using three multiplex reactions, with each reaction containing three indexed libraries.The libraries were combined into three pools based upon similar endogenous content, with input volumes varied to reach similar amounts of input DNA.
Manufacturer's recommendations were followed for the enrichment and PCR steps, using myBaits manual (version 5.01) and the optional "High Sensitivity Protocol" for ancient DNA.For the blockers mix setup, the suggested set of blockers for plants were used: 5.0 μl of Block O and 0.5 μl of Block X.Two rounds of enrichment were used, with a hybridization temperature of 55°C during both overnight incubation periods.After the first enrichment, libraries were amplified 14 cycles, as per the manual, but following the second enrichment, libraries were amplified 10-14 cycles to reach the necessary amounts for sequencing (Fig. S2).The enriched libraries were quantified and sequenced on a portion of a partial lane of an Illumina HiSeq 4000 in paired-end mode for 150 cycles at the Novogene facility in Sacramento, California.
Reference datasets.Genotypes of modern grapevines in the GrapeReSeq genetic diversity panel provide the main context to interpret the Grotta di Pertosa archaeological samples.The GrapeReSeq panel consists of 783 domesticated cultivars and 112 wild grapevine accessions [11,12], each genotyped at >10,000 SNPs using Illumina chip technology.The genotype data were taken from the respective repositories and examined to confirm SNP coordinates in the grape nuclear reference genome 12×.v2 [13].Specifically, the metadata for the cultivar dataset (available at https://urgi.versailles.inra.fr/Species/Vitis/Data-Sequences/Genotyping-data)was used to localize Illumina chip SNPs by taking the flanking sequences and aligning them to the reference genome with BLAST+ (version 2.10.0)[14].SNPs were retained when they could be unambiguously aligned to the reference genome and matched the genetic positions listed in the metadata.Likewise, the PLINK files provided in the wild accession database were examined to confirm that the positions and alleles matched those in the cultivar database.After excluding SNPs which could not be unambiguously localized, the cultivar dataset contained 8773 SNPs and the combined cultivar and wild dataset contained 8191 SNPs.
In addition to the modern grapevine accessions, the Grotta di Pertosa samples were compared to two published archaeological datasets which utilized enrichment with the custom myBaits insolution kit.The first archaeological dataset consists of twenty-eight grape seeds excavated in France dating to the Iron Age to medieval period [1] and the second consists of three seeds excavated from Georgia and dating to the past 600 years [15].
The libraries which had been enriched for SNP loci were used in population and relatedness analyses.The UDG treatment removed damaged nucleotides from the center of the DNA molecules, but damage was still present (>0.01 frequency) in the first and last three nucleotides.To reduce the impact of damage in analyses, the trimBam function of BamUtils (version 1.0.14)[21] was used to trim three nucleotides from the 5' and 3' end of each read.
The raw data from the published datasets from French [1] and Georgian [15] archaeological samples were processed following the same methods as those used on the Grotta di Pertosa data, except for the steps to correct DNA damage.As previous projects did not implement UDG treatment, the damage patterns were corrected using two steps.First, the base qualities of likely damaged positions were rescaled using mapDamage.Subsequently, five nucleotides were trimmed from the 5' and 3' end of each read using BamUtils.The depth of coverage on the GrapeReSeq SNP loci was calculated with SAMtools (version 1.10) [22] after filtering PCR duplicates, removing poorly mapped reads, and correcting for damage.
Ramos-Madrigal, et al. [1] demonstrate that archaeological seeds contain primarily maternal DNA, i.e., the genotype of the vine on which the berry was growing.Paternal DNA is present in fresh seeds in the endosperm and embryo, but this paternal contribution has been observed to be <20% in other archaeological seeds and can be excluded to infer the maternal genotype at many loci [1].The amount of paternal DNA was investigated here by examining the relative frequency of the reference allele at SNP loci where both alleles were observed.GATK's HaplotypeCaller (version 4.1.3)was used to calculate the number of reads corresponding to the reference and alternative alleles.The paternal DNA analysis was restricted to loci with a minimum depth of coverage of ten reads in R (version 4.2.0)[23].
Population structure was examined using a principal components analysis implemented in the smartpca function of the EIGENSOFT (version 7.2.1)package [24,25].This approach uses psudeo-haploid genomes to accommodate samples with highly variable sequencing depths and missing data.Mapped reads for the enriched libraries were randomly sampled at the GrapeReSeq SNP loci using SAMtools and pileupCaller (https://github.com/stschiff/sequenceTools),using a minimum base quality of 30, minimum mapping quality of 30, and disabling the base alignment quality computation.The files were merged with pseudo-haploid versions of the reference panel using EIGENSOFT.The archaeological samples were projected onto the modern reference panel in smartpca using the autoshrink mode.PCA data were processed in R (version 4.2.0)[23] using the ggplot2 package [26].
Genotype calling was performed on well-covered sites that should be minimally impacted by limited paternal DNA contributions.A HaplotypeCaller GVCF file for each sample was processed using BCFtools (version 1.15) [22], setting genotypes to missing data if they had a depth of coverage less than ten reads or unusually low or high frequencies of the alternative allele (0<AF<0.2 or 0.8<AF<1).That is, well-covered loci with exclusively one allele were genotyped as homozygous and putative heterozygous sites were genotyped as heterozygous if the frequency of the alternative allele was 0.2-0.8.While this approach discards data some potential true heterozygous sites, the retained data are of a higher quality.It is also worth noting that a small proportion of sites which are called as homozygous are likely heterozygous, due to the random sampling of alleles at a relatively low depth of coverage.For SNP sites with 10× coverage, the binomial distribution estimates that 0.195% of true heterozygous sites would be genotyped as homozygous.Still, the absolute number of these genotyping errors are expected to be low; for example, for modern samples in the GrapeReSeq dataset (mean heterozygosity rate of 0.32), this would correspond to approximately six heterozygous loci genotyped as homozygous in the 10,000 SNP panel.Furthermore, the kinship software used in this project was selected to accommodate low levels of error (e.g., enrichment biases, sequencing errors, genotyping errors on the Illumina SNP chip).
Kinship between samples and the GrapeReSeq panel was investigated using two methods: one approach with called genotypes and a second method using genotype likelihoods to accommodate samples with lower coverage.In the first approach, genotypes of individual archaeological samples were merged with the GrapeReSeq panel using PLINK (version 1.9) [27] and processed in KING (version 2.2.8) [28].In the second approach, genotype likelihoods for the SNP loci were calculated in ANGSD (version 0.930) [29], using the SAMtools genotype likelihood model, a minimum mapping quality of 30, and a minimum base quality of 20.The genotype likelihoods were analyzed in ngsRelate (version 2) [30], assuming outbred individuals and using the allele frequencies of the GrapeReSeq panel.For both the genotype calls and genotype likelihoods, relationships between pairs of samples were identified using the KING kinship coefficient and the proportion of loci where no alleles are shared (zero alleles identical by state: IBS0).The classification of relationships followed previous suggestions to distinguish identical clones (K≥0.49 and IBS0≤0.001),parentoffspring relationships (0.177<K<0.354and IBS0≤0.001),highly related individuals (0.177<K<0.354and IBS0≤0.25),and unrelated individuals [1,11].The "highly related" category is effectively analogous to sibling relationships, but simulations by Ramos-Madrigal, et al. [1] demonstrated that pollination by a closely related vine could impact interpretations if archaeological seeds contain high amounts of paternal DNA contributions.
To further examine genetic distances between archaeological samples and modern accessions, PLINK was used to calculate identity by state (IBS).As several samples were identified as clonal, the data for each clone was merged to increase the number of SNP loci which could be genotyped.Genotype calling was performed on the merged BAM files as previously described.Archaeological samples were included in the analysis if they were genotyped for at least 2000 loci.

Figure S2
. DNA degradation patterns.A. Nucleotide misincorporations at the 5' and 3' ends of mapped reads were identified using mapDamage (20).Shotgun sequencing yielded characteristic patterns of apparent C-to-T transitions at both ends of the reads, consistent with libraries prepared on single-stranded DNA with deaminated cytosine residues (31).The libraries used in the SNP enrichment experiments were prepared on DNA treated with uracil-DNA-glycosylase (UDG) to repair damage.The enriched libraires demonstrated a reduction in damage patterns, particularly in the center of the reads, and remaining damage patterns in the terminal three nucleotides were corrected bioinformatically.As expected for single-stranded DNA libraries, few G-to-A misincorporations are observed.B. DNA length distributions for both sequencing experiments.The shotgun sequencing yielded shorter reads than the enrichment experiment (mean of 40.3 bp and 51.8 bp, respectively).Specimen P58 was not enriched for SNPs as it yielded negligible amounts of endogenous DNA and a damage profile which was not consistent with degraded DNA.

Figure S3
. Evidence of paternal DNA.Histograms show the proportions of reads with the reference allele at SNP loci sequenced to a depth of at least 10×.The top panels show all SNPs, including putative homozygous sites, while the bottom panels are restricted to sites where the reference and alternative alleles were observed.Most seeds have evidence of limited contributions of paternal DNA, as seen in the small peak of heterozygous loci where 5-15% of reads contain the alternative allele.This scenario can be interpreted as loci where the maternal genotype is homozygous for the alternative variant and the paternal DNA contains at least one copy of the reference allele.Paternal DNA is likely also responsible for shifting modes above 50%; these loci are consistent with a heterozygous maternal genotype and ~10% paternal DNA with a homozygous reference genotype.Given the evidence for paternal DNA contributions, genotyping was conducted in a conservative manner, following Ramos-Madrigal, et al. (1).Specifically, potential heterozygous loci were recorded as missing data if they contained <20% or >80% of one allele (red).Homozygous loci and well-supported heterozygous loci were genotyped and included in downstream relatedness analyses (blue).All read depths were calculated after removing PCR duplicates and reads with low mapping qualities.

Table S1 .
Bronze Age Vitis vinifera seeds found in southern Italy.

Table S2 .
Radiocarbon datings performed on seeds from the same stratigraphic unit.

Table S3 .
GMM.The table shows the results of the LDA performed on the archaeological sample with different probability thresholds, above; the model error estimated on the basis of the allocation to domestic and wild groups of known modern specimens from the reference collection, in the middle part; the comparison between the model error and the proportion of domesticated pips in the archaeological sample, at the bottom.

Table S4 .
Results of the LDA performed at cultivar level on the 9 pips identified as domesticated.Cases with p value over 0.75 were highlighted.